
    Multimodal Visual Concept Learning with Weakly Supervised Techniques

    Despite the availability of a huge amount of video data accompanied by descriptive texts, it is not always easy to exploit the information contained in natural language to automatically recognize video concepts. Towards this goal, in this paper we use textual cues as a means of supervision, introducing two weakly supervised techniques that extend the Multiple Instance Learning (MIL) framework: Fuzzy Sets Multiple Instance Learning (FSMIL) and Probabilistic Labels Multiple Instance Learning (PLMIL). The former encodes the spatio-temporal imprecision of the linguistic descriptions with Fuzzy Sets, while the latter models different interpretations of each description's semantics with Probabilistic Labels; both are formulated through a convex optimization algorithm. In addition, we provide a novel technique to extract weak labels in the presence of complex semantics, which consists of semantic similarity computations. We evaluate our methods on two distinct problems, namely face and action recognition, in the challenging and realistic setting of movies accompanied by their screenplays, contained in the COGNIMUSE database. We show that, on both tasks, our methods considerably outperform a state-of-the-art weakly supervised approach, as well as other baselines. Comment: CVPR 201
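    The weak-label extraction step can be illustrated with a small sketch. The abstract does not spell out the similarity measure used, so the bag-of-words cosine below, the `weak_labels` helper, and the example description and concept texts are all illustrative assumptions, not the authors' implementation: a textual description is scored against each candidate concept, and the normalized scores serve as probabilistic weak labels.

```python
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def weak_labels(description: str, concepts: dict) -> dict:
    """Assign each concept a probabilistic weak label proportional to its
    textual similarity with the description (normalized to sum to 1)."""
    d = Counter(description.lower().split())
    sims = {c: cosine(d, Counter(text.lower().split()))
            for c, text in concepts.items()}
    total = sum(sims.values())
    return {c: s / total for c, s in sims.items()} if total else sims

# Hypothetical screenplay snippet scored against two candidate concepts.
labels = weak_labels(
    "he draws his sword and charges",
    {"fight": "sword fight charges attack",
     "dialogue": "two people talk quietly"},
)
```

    In the actual method these probabilistic labels then enter the MIL objective as soft supervision rather than hard class assignments.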

    Pre-training Music Classification Models via Music Source Separation

    In this paper, we study whether music source separation can be used as a pre-training strategy for music representation learning, targeted at music classification tasks. To this end, we first pre-train U-Net networks under various music source separation objectives, such as the isolation of vocal or instrumental sources from a musical piece; afterwards, we attach a convolutional tail network to the pre-trained U-Net and jointly fine-tune the whole network. The features learned by the separation network are also propagated to the tail network through skip connections. Experimental results on two widely used and publicly available datasets indicate that pre-training the U-Nets with a music source separation objective can improve performance compared to both training the whole network from scratch and using the tail network as a standalone model, in two music classification tasks: music auto-tagging, when vocal separation is used, and music genre classification in the case of multi-source separation. Comment: 5 pages (4+references), 3 figures. ICASSP-24 submissio
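    The two-stage recipe (pre-train on separation, then fit a classifier on the learned features) can be sketched with a toy linear analogue. This is purely illustrative: closed-form least squares stands in for U-Net training, random vectors stand in for spectrograms, and all variable names are made up for the sketch; it only mirrors the workflow, not the paper's models.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d = 400, 16
source = rng.normal(size=(n, d))    # stands in for the vocal source
noise = rng.normal(size=(n, d))     # stands in for the accompaniment
mixture = source + 0.5 * noise      # the observed "song"

# Stage 1: pre-train the "encoder" on a separation objective, i.e.
# recover the source from the mixture (closed-form least squares).
W = np.linalg.lstsq(mixture, source, rcond=None)[0]
features = mixture @ W              # separation-aware representation

# Stage 2: fit a "tail" classifier on the pre-trained features for a
# downstream label that depends on the source content.
y = np.sign(source[:, 0])
w_tail = np.linalg.lstsq(features, y, rcond=None)[0]
accuracy = (np.sign(features @ w_tail) == y).mean()
```

    In the real pipeline the encoder is a U-Net, the tail is convolutional, and skip connections additionally feed intermediate separation features into the tail before joint fine-tuning.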

    Multi-Source Contrastive Learning from Musical Audio

    Contrastive learning constitutes an emerging branch of self-supervised learning that leverages large amounts of unlabeled data by learning a latent space in which pairs of different views of the same sample are associated. In this paper, we propose musical source association as a pair-generation strategy in the context of contrastive music representation learning. To this end, we modify COLA, a widely used contrastive audio learning framework, to learn to associate a song excerpt with a stochastically selected and automatically extracted vocal or instrumental source. We further introduce a novel modification to the contrastive loss that incorporates information about the presence or absence of specific sources. Our experimental evaluation on three different downstream tasks (music auto-tagging, instrument classification and music genre classification), using the publicly available Magna-Tag-A-Tune (MTAT) as a source dataset, yields results competitive with existing methods in the literature, as well as faster network convergence. The results also show that this pre-training method can be steered towards specific features, according to the selected musical source, while also being dependent on the quality of the separated sources. Comment: 8 pages, 5 figures, 3 tables. (Slightly edited) submission at SMC2
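    The pair-generation strategy can be sketched as follows. The function name, the source names, and the presence-mask convention are illustrative assumptions (the paper builds on COLA, but this is not its code): an anchor excerpt is paired with one stochastically chosen separated source, and a presence mask records which sources exist so the modified loss can use it.

```python
import random

def make_pair(song_excerpt, separated, rng=random):
    """Pick one separated source at random as the positive view.

    `separated` maps source names ("vocals", "accompaniment", ...) to
    the automatically separated signals; a source may be absent (None),
    e.g. no vocals in an instrumental track.
    """
    available = {k: v for k, v in separated.items() if v is not None}
    name = rng.choice(sorted(available))
    presence = {k: v is not None for k, v in separated.items()}
    return song_excerpt, available[name], name, presence

# Instrumental example: the vocal source is absent, so the
# accompaniment is the only candidate positive view.
anchor, positive, src, presence = make_pair(
    [0.1, -0.2, 0.3],
    {"vocals": None, "accompaniment": [0.05, -0.1, 0.2]},
)
```

    The presence mask is what the modified contrastive loss would consume to account for missing sources.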

    Music signal processing with application to recognition

    This thesis lies in the area of signal processing and analysis of music signals using computational methods for the extraction of effective representations for automatic recognition. We explore and develop efficient algorithms using nonlinear methods for the analysis of the structure of music signals, which is of importance for their modeling. Our main research direction deals with the analysis of the structure and the characteristics of musical instruments, in order to gain insight into their function and properties. We study the characteristics of the different genres of music. Finally, we evaluate the effectiveness of the proposed nonlinear models for the detection of perceptually important music and audio events.

    The approach we follow contributes to state-of-the-art technologies for automatic computer-based recognition of musical signals and audio summarization, which are nowadays essential in everyday life. Because of the vast amount of music, audio and multimedia data on the web and on our personal computers, this study finds use in applications such as automatic genre classification, automatic recognition of music's basic structures, such as musical instruments, and audio content analysis for music and audio summarization.

    The above-mentioned applications require robust solutions to information processing problems. Towards this goal, the development of efficient digital signal processing methods and the extraction of relevant features are of importance. In this thesis we propose such methods and algorithms for feature extraction, with results that render the descriptors directly applicable. The proposed methods are applied to classification experiments, illustrating that they can capture important aspects of music, such as the micro-variations of its structure.
    Descriptors based on macro-structures may reduce the complexity of the classification system, since satisfactory results can be achieved using simpler statistical models. Finally, the introduction of a "music" filterbank appears to be promising for automatic genre classification.
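    The abstract mentions a "music" filterbank but does not detail its design, so the sketch below makes an assumption: triangular filters centered at equal-tempered note frequencies (analogous to a mel filterbank, but aligned to the chromatic scale). The function name and all parameters are illustrative, not the thesis design.

```python
import numpy as np

def music_filterbank(n_fft=1024, sr=22050, f_min=110.0, n_notes=36):
    """Return a (n_notes, n_fft // 2 + 1) matrix of triangular filters
    centered at equal-tempered note frequencies: f_min * 2**(k / 12)."""
    centers = f_min * 2.0 ** (np.arange(-1, n_notes + 1) / 12.0)
    freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    bank = np.zeros((n_notes, freqs.size))
    for i in range(n_notes):
        lo, c, hi = centers[i], centers[i + 1], centers[i + 2]
        rising = (freqs - lo) / (c - lo)      # ramp up to the center
        falling = (hi - freqs) / (hi - c)     # ramp down past the center
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

fb = music_filterbank()
# Applying `fb` to an FFT magnitude frame yields note-aligned band
# energies in place of mel-band energies.
```

    Note that at typical FFT resolutions the lowest semitone-wide filters can be narrower than one frequency bin; a practical design would widen or merge them.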

    Musical instruments signal analysis and recognition using fractal features

    Analyzing the structure of music signals at multiple time scales is of importance both for modeling music signals and for their automatic computer-based recognition. In this paper we propose the multiscale fractal dimension profile as a descriptor for quantifying the multiscale complexity of the music waveform. We have experimentally found that this descriptor can discriminate among different musical instruments in several respects. We compare the descriptiveness of our features against that of Mel frequency cepstral coefficients (MFCCs) using both static and dynamic classifiers, such as Gaussian mixture models (GMMs) and hidden Markov models (HMMs). The methods and features proposed in this paper are promising for music signal analysis and of direct applicability in large-scale music classification tasks.